Overview

Dataset statistics

Number of variables: 10
Number of observations: 20640
Missing cells: 207
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 76.0 B

Variable types

NUM: 9
CAT: 1

Reproduction

Analysis started: 2020-08-16 08:00:14.047470
Analysis finished: 2020-08-16 08:01:02.280228
Duration: 48.23 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

latitude is highly correlated with longitude [High correlation]
longitude is highly correlated with latitude [High correlation]
total_bedrooms is highly correlated with total_rooms and 1 other field [High correlation]
total_rooms is highly correlated with total_bedrooms and 1 other field [High correlation]
households is highly correlated with total_rooms and 2 other fields [High correlation]
population is highly correlated with households [High correlation]
total_bedrooms has 207 (1.0%) missing values [Missing]
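
The two missing-value percentages in this report (0.1% of cells, 1.0% of total_bedrooms) describe the same 207 values against different denominators. A minimal sketch of that arithmetic, using only the counts reported above (the variable names are illustrative, not part of the report):

```python
# Counts taken from the dataset statistics and warnings above.
n_rows = 20640
n_cols = 10
missing = 207  # every missing cell is in total_bedrooms

# Share of all cells in the table that are missing.
cell_pct = 100 * missing / (n_rows * n_cols)

# Share of rows in which total_bedrooms is missing.
column_pct = 100 * missing / n_rows

print(f"{cell_pct:.1f}% of all cells")
print(f"{column_pct:.1f}% of total_bedrooms")
```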

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct count: 844
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.56970445736432
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.8
median: -118.49
Q3: -118.01
95-th percentile: -117.08
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.003531724
Coefficient of variation (CV): -0.01675618195
Kurtosis: -1.330152366
Mean: -119.5697045
Median Absolute Deviation (MAD): 1.28
Skewness: -0.297801208
Sum: -2467918.7
Variance: 4.014139367
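
Several of the longitude figures above are simple functions of one another. As a hedged sanity check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported values; the variable names are illustrative and this is not the report's own code.

```python
import math

# Values copied from the longitude statistics above.
minimum, maximum = -124.35, -114.31
q1, q3 = -121.8, -118.01
mean, std = -119.5697045, 2.003531724

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(round(value_range, 2), round(iqr, 2))
print(cv, variance)
```

The coefficient of variation is negative only because the mean longitude is negative; for a coordinate like this the CV carries little interpretive weight.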
Histogram with fixed size bins (bins=10)
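
The histograms in this report use ten equal-width bins between the minimum and the maximum. A minimal sketch of that kind of binning, assuming the common convention that the last bin includes the maximum (the report's own plots are drawn by Matplotlib; the function name and sample values here are hypothetical):

```python
def fixed_width_histogram(values, bins=10):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single degenerate bin.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands on the final edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical sample, not taken from the dataset.
edges, counts = fixed_width_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
print(edges)
print(counts)
```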
Value                 Count  Frequency (%)
-118.31               162    0.8%
-118.3                160    0.8%
-118.29               148    0.7%
-118.27               144    0.7%
-118.32               142    0.7%
-118.28               141    0.7%
-118.35               140    0.7%
-118.36               138    0.7%
-118.19               135    0.7%
-118.25               128    0.6%
Other values (834)    19202  93.0%

Value                 Count  Frequency (%)
-124.35               1      < 0.1%
-124.3                2      < 0.1%
-124.27               1      < 0.1%
-124.26               1      < 0.1%
-124.25               1      < 0.1%

Value                 Count  Frequency (%)
-114.31               1      < 0.1%
-114.47               1      < 0.1%
-114.49               1      < 0.1%
-114.55               1      < 0.1%
-114.56               1      < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 862
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63186143410853
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
median: 34.26
Q3: 37.71
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 2.135952397
Coefficient of variation (CV): 0.05994501302
Kurtosis: -1.117759781
Mean: 35.63186143
Median Absolute Deviation (MAD): 1.23
Skewness: 0.4659530037
Sum: 735441.62
Variance: 4.562292644
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
34.06                 244    1.2%
34.05                 236    1.1%
34.08                 234    1.1%
34.07                 231    1.1%
34.04                 221    1.1%
34.09                 212    1.0%
34.02                 208    1.0%
34.1                  203    1.0%
34.03                 193    0.9%
33.93                 181    0.9%
Other values (852)    18477  89.5%

Value                 Count  Frequency (%)
32.54                 1      < 0.1%
32.55                 3      < 0.1%
32.56                 10     < 0.1%
32.57                 18     0.1%
32.58                 26     0.1%

Value                 Count  Frequency (%)
41.95                 2      < 0.1%
41.92                 1      < 0.1%
41.88                 1      < 0.1%
41.86                 3      < 0.1%
41.84                 1      < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.639486434108527
Minimum: 1.0
Maximum: 52.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58555761
Coefficient of variation (CV): 0.4394477408
Kurtosis: -0.8006288536
Mean: 28.63948643
Median Absolute Deviation (MAD): 10
Skewness: 0.0603306376
Sum: 591119
Variance: 158.3962604
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
52                    1273   6.2%
36                    862    4.2%
35                    824    4.0%
16                    771    3.7%
17                    698    3.4%
34                    689    3.3%
26                    619    3.0%
33                    615    3.0%
18                    570    2.8%
25                    566    2.7%
Other values (42)     13153  63.7%

Value                 Count  Frequency (%)
1                     4      < 0.1%
2                     58     0.3%
3                     62     0.3%
4                     191    0.9%
5                     244    1.2%

Value                 Count  Frequency (%)
52                    1273   6.2%
51                    48     0.2%
50                    136    0.7%
49                    134    0.6%
48                    177    0.9%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5926
Unique (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2635.7630813953488
Minimum: 2.0
Maximum: 39320.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 620.95
Q1: 1447.75
median: 2127
Q3: 3148
95-th percentile: 6213.2
Maximum: 39320
Range: 39318
Interquartile range (IQR): 1700.25

Descriptive statistics

Standard deviation: 2181.615252
Coefficient of variation (CV): 0.8276977802
Kurtosis: 32.630927
Mean: 2635.763081
Median Absolute Deviation (MAD): 797
Skewness: 4.147343451
Sum: 54402150
Variance: 4759445.106
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
1527                  18     0.1%
1613                  17     0.1%
1582                  17     0.1%
2127                  16     0.1%
1703                  15     0.1%
1471                  15     0.1%
2053                  15     0.1%
1722                  15     0.1%
1607                  15     0.1%
1717                  15     0.1%
Other values (5916)   20482  99.2%

Value                 Count  Frequency (%)
2                     1      < 0.1%
6                     1      < 0.1%
8                     1      < 0.1%
11                    1      < 0.1%
12                    1      < 0.1%

Value                 Count  Frequency (%)
39320                 1      < 0.1%
37937                 1      < 0.1%
32627                 1      < 0.1%
32054                 1      < 0.1%
30450                 1      < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct count: 1923
Unique (%): 9.4%
Missing: 207
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8705525375618
Minimum: 1.0
Maximum: 6445.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 137
Q1: 296
median: 435
Q3: 647
95-th percentile: 1275.4
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 421.3850701
Coefficient of variation (CV): 0.7834321252
Kurtosis: 21.98557506
Mean: 537.8705525
Median Absolute Deviation (MAD): 162
Skewness: 3.459546332
Sum: 10990309
Variance: 177565.3773
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
280                   55     0.3%
331                   51     0.2%
345                   50     0.2%
393                   49     0.2%
343                   49     0.2%
394                   48     0.2%
328                   48     0.2%
348                   48     0.2%
272                   47     0.2%
309                   47     0.2%
Other values (1913)   19941  96.6%
(Missing)             207    1.0%

Value                 Count  Frequency (%)
1                     1      < 0.1%
2                     2      < 0.1%
3                     5      < 0.1%
4                     7      < 0.1%
5                     6      < 0.1%

Value                 Count  Frequency (%)
6445                  1      < 0.1%
6210                  1      < 0.1%
5471                  1      < 0.1%
5419                  1      < 0.1%
5290                  1      < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3888
Unique (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.4767441860465
Minimum: 3.0
Maximum: 35682.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 348
Q1: 787
median: 1166
Q3: 1725
95-th percentile: 3288
Maximum: 35682
Range: 35679
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.462122
Coefficient of variation (CV): 0.7944444737
Kurtosis: 73.55311639
Mean: 1425.476744
Median Absolute Deviation (MAD): 440
Skewness: 4.935858227
Sum: 29421840
Variance: 1282470.457
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
891                   25     0.1%
761                   24     0.1%
1227                  24     0.1%
850                   24     0.1%
1052                  24     0.1%
825                   23     0.1%
999                   22     0.1%
782                   22     0.1%
1005                  22     0.1%
781                   21     0.1%
Other values (3878)   20409  98.9%

Value                 Count  Frequency (%)
3                     1      < 0.1%
5                     1      < 0.1%
6                     1      < 0.1%
8                     4      < 0.1%
9                     2      < 0.1%

Value                 Count  Frequency (%)
35682                 1      < 0.1%
28566                 1      < 0.1%
16305                 1      < 0.1%
16122                 1      < 0.1%
15507                 1      < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1815
Unique (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802325581
Minimum: 1.0
Maximum: 6082.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 125
Q1: 280
median: 409
Q3: 605
95-th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
306                   57     0.3%
386                   56     0.3%
335                   56     0.3%
282                   55     0.3%
429                   54     0.3%
375                   53     0.3%
284                   51     0.2%
297                   51     0.2%
362                   50     0.2%
380                   50     0.2%
Other values (1805)   20107  97.4%

Value                 Count  Frequency (%)
1                     1      < 0.1%
2                     3      < 0.1%
3                     4      < 0.1%
4                     4      < 0.1%
5                     7      < 0.1%

Value                 Count  Frequency (%)
6082                  1      < 0.1%
5358                  1      < 0.1%
5189                  1      < 0.1%
5050                  1      < 0.1%
4930                  1      < 0.1%

median_income
Real number (ℝ≥0)

Distinct count: 12928
Unique (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8706710029069766
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.60057
Q1: 2.5634
median: 3.5348
Q3: 4.74325
95-th percentile: 7.300305
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.17985

Descriptive statistics

Standard deviation: 1.899821718
Coefficient of variation (CV): 0.4908249026
Kurtosis: 4.952524102
Mean: 3.870671003
Median Absolute Deviation (MAD): 1.0642
Skewness: 1.646656702
Sum: 79890.6495
Variance: 3.60932256
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
3.125                 49     0.2%
15.0001               49     0.2%
2.875                 46     0.2%
4.125                 44     0.2%
2.625                 44     0.2%
3.875                 41     0.2%
3                     38     0.2%
3.375                 38     0.2%
3.625                 37     0.2%
4                     37     0.2%
Other values (12918)  20217  98.0%

Value                 Count  Frequency (%)
0.4999                12     0.1%
0.536                 10     < 0.1%
0.5495                1      < 0.1%
0.6433                1      < 0.1%
0.6775                1      < 0.1%

Value                 Count  Frequency (%)
15.0001               49     0.2%
15                    2      < 0.1%
14.9009               1      < 0.1%
14.5833               1      < 0.1%
14.4219               1      < 0.1%

median_house_value
Real number (ℝ≥0)

Distinct count: 3842
Unique (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.81690891474
Minimum: 14999.0
Maximum: 500001.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 161.2 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66200
Q1: 119600
median: 179700
Q3: 264725
95-th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816e+10
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
500001                965    4.7%
137500                122    0.6%
162500                117    0.6%
112500                103    0.5%
187500                93     0.5%
225000                92     0.4%
350000                79     0.4%
87500                 78     0.4%
275000                65     0.3%
150000                64     0.3%
Other values (3832)   18862  91.4%

Value                 Count  Frequency (%)
14999                 4      < 0.1%
17500                 1      < 0.1%
22500                 4      < 0.1%
25000                 1      < 0.1%
26600                 1      < 0.1%

Value                 Count  Frequency (%)
500001                965    4.7%
500000                27     0.1%
499100                1      < 0.1%
499000                1      < 0.1%
498800                1      < 0.1%

ocean_proximity
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 80.6 KiB

Value                 Count  Frequency (%)
<1H OCEAN             9136   44.3%
INLAND                6551   31.7%
NEAR OCEAN            2658   12.9%
NEAR BAY              2290   11.1%
ISLAND                5      < 0.1%

Length

Max length: 10
Median length: 9
Mean length: 8.064922481
Min length: 6
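
The length statistics follow directly from the category counts listed for ocean_proximity. A short sketch that recomputes them, assuming length means the number of characters in each label (the variable names are illustrative):

```python
from statistics import median

# Category counts as reported for ocean_proximity.
counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}

total = sum(counts.values())

# One length entry per observation, so the median is taken over all rows.
lengths = sorted(len(label) for label, n in counts.items() for _ in range(n))

max_length = max(lengths)
min_length = min(lengths)
median_length = median(lengths)
mean_length = sum(len(label) * n for label, n in counts.items()) / total

print(max_length, median_length, round(mean_length, 9), min_length)
```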

Interactions

[81 interaction plots (Matplotlib SVG figures), not reproduced in this text]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
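
The calculation just described can be sketched in plain Python. This is an illustration under stated assumptions, not the report's implementation: the function name and sample data are hypothetical, and the 1/n factors of the covariance and standard deviations are left out because they cancel.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either variable is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical data: a perfectly linear relationship.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(pearson_r(xs, ys))

# Shifting and rescaling one variable leaves r unchanged.
print(pearson_r(xs, [10 * y + 5 for y in ys]))
```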

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
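
As a rough sketch of that procedure: rank each variable, then apply the Pearson formula to the ranks. The helper names and sample data below are hypothetical, and giving tied values their average rank is an assumption about tie handling rather than something stated in this report.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this sample the rank correlation is perfect while Pearson's r falls short of 1, which is the difference the paragraph above describes.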

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
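
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name and data are hypothetical, and library implementations often use a tie-corrected variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 is a tie and counts toward neither total
    return (concordant - discordant) / len(pairs)


# Hypothetical data: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```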

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[3 missing-value plots (Matplotlib SVG figures), not reproduced in this text]

Sample

First rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0        -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0  NEAR BAY
1        -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0  NEAR BAY
2        -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0  NEAR BAY
3        -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0  NEAR BAY
4        -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0  NEAR BAY
5        -122.25     37.85                52.0        919.0           213.0       413.0       193.0         4.0368            269700.0  NEAR BAY
6        -122.25     37.84                52.0       2535.0           489.0      1094.0       514.0         3.6591            299200.0  NEAR BAY
7        -122.25     37.84                52.0       3104.0           687.0      1157.0       647.0         3.1200            241400.0  NEAR BAY
8        -122.26     37.84                42.0       2555.0           665.0      1206.0       595.0         2.0804            226700.0  NEAR BAY
9        -122.25     37.84                52.0       3549.0           707.0      1551.0       714.0         3.6912            261100.0  NEAR BAY

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
20630    -121.32     39.29                11.0       2640.0           505.0      1257.0       445.0         3.5673            112000.0  INLAND
20631    -121.40     39.33                15.0       2655.0           493.0      1200.0       432.0         3.5179            107200.0  INLAND
20632    -121.45     39.26                15.0       2319.0           416.0      1047.0       385.0         3.1250            115600.0  INLAND
20633    -121.53     39.19                27.0       2080.0           412.0      1082.0       382.0         2.5495             98300.0  INLAND
20634    -121.56     39.27                28.0       2332.0           395.0      1041.0       344.0         3.7125            116800.0  INLAND
20635    -121.09     39.48                25.0       1665.0           374.0       845.0       330.0         1.5603             78100.0  INLAND
20636    -121.21     39.49                18.0        697.0           150.0       356.0       114.0         2.5568             77100.0  INLAND
20637    -121.22     39.43                17.0       2254.0           485.0      1007.0       433.0         1.7000             92300.0  INLAND
20638    -121.32     39.43                18.0       1860.0           409.0       741.0       349.0         1.8672             84700.0  INLAND
20639    -121.24     39.37                16.0       2785.0           616.0      1387.0       530.0         2.3886             89400.0  INLAND